A Greedy Algorithm for Aligning DNA Sequences

Authors

  • Zheng Zhang
  • Scott Schwartz
  • Lukas Wagner
  • Webb Miller
Abstract

For aligning DNA sequences that differ only by sequencing errors, or by equivalent errors from other sources, a greedy algorithm can be much faster than traditional dynamic programming approaches and yet produce an alignment that is guaranteed to be theoretically optimal. We introduce a new greedy alignment algorithm with particularly good performance and show that it computes the same alignment as does a certain dynamic programming algorithm, while executing over 10 times faster on appropriate data. An implementation of this algorithm is currently used in a program that assembles the UniGene database at the National Center for Biotechnology Information.
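The abstract does not spell out the algorithm itself, so the following is only a minimal sketch of the general greedy idea it contrasts with dynamic programming. It is the classic furthest-reaching-diagonal method for unit-cost edit distance, not the authors' algorithm or implementation. The function name `greedy_edit_distance` and the unit cost for each mismatch, insertion, and deletion are assumptions made for illustration. For each number of differences d, the code records how far along each diagonal the alignment can reach, greedily extending through runs of matching characters. When the two sequences differ in only a few places, only a few diagonals are ever visited.

```python
def greedy_edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via greedy diagonal extension.

    Illustrative sketch only: assumes mismatch, insertion, and deletion
    each cost 1. Diagonal k holds cells (i, j) with j = i + k, where i
    indexes `a` and j indexes `b`.
    """
    m, n = len(a), len(b)
    NEG = -(m + n + 2)  # sentinel for "diagonal not reached yet"

    def extend(i: int, k: int) -> int:
        # Greedy step: slide along diagonal k while characters match.
        while i < m and i + k < n and a[i] == b[i + k]:
            i += 1
        return i

    # With d = 0 differences, only the main diagonal is reachable.
    prev = {0: extend(0, 0)}
    if prev[0] == m and m == n:
        return 0

    d = 0
    while True:
        d += 1
        cur = {}
        for k in range(max(-d, -m), min(d, n) + 1):
            # Furthest row on diagonal k using one more difference:
            i = max(
                prev.get(k, NEG) + 1,      # mismatch on the same diagonal
                prev.get(k - 1, NEG),      # extra character in b
                prev.get(k + 1, NEG) + 1,  # extra character in a
            )
            # Stay inside the grid, then extend through matches.
            i = extend(min(i, m, n - k), k)
            if i == m and i + k == n:
                return d
            cur[k] = i
        prev = cur


if __name__ == "__main__":
    print(greedy_edit_distance("ACGTACGT", "ACGTTCGT"))
```

The work done grows with the number of differences rather than with the full product of the sequence lengths, which is why this style of algorithm suits sequences that differ only by a small number of errors.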

Similar resources

Identifying DNA and protein patterns with statistically significant alignments of multiple sequences

MOTIVATION Molecular biologists frequently can obtain interesting insight by aligning a set of related DNA, RNA or protein sequences. Such alignments can be used to determine either evolutionary or functional relationships. Our interest is in identifying functional relationships. Unless the sequences are very similar, it is necessary to have a specific strategy for measuring-or scoring-the rela...

A new greedy randomised adaptive search procedure for multiple sequence alignment

The Multiple Sequence Alignment (MSA) is one of the most challenging tasks in bioinformatics. It consists of aligning several sequences to show the fundamental relationship and the common characteristics between a set of protein or nucleic sequences; this problem has been shown to be NP-complete if the number of sequences is >2. In this paper, a new incomplete algorithm based on a Greedy Random...

gpALIGNER: A Fast Algorithm for Global Pairwise Alignment of DNA Sequences

Bioinformatics, through the sequencing of the full genomes for many species, is increasingly relying on efficient global alignment tools exhibiting both high sensitivity and specificity. Many computational algorithms have been applied for solving the sequence alignment problem. Dynamic programming, statistical methods, approximation and heuristic algorithms are the most common methods appli...

An Iterated Greedy Algorithm for Solving the Blocking Flow Shop Scheduling Problem with Total Flow Time Criteria

In this paper, we propose an iterated greedy algorithm for solving the blocking flow shop scheduling problem with a total flow time minimization objective. The steps of this algorithm are designed to be very efficient. For generating an initial solution, we develop an efficient constructive heuristic by modifying the best-known NEH algorithm. Effectiveness of the proposed iterated greedy algorithm is t...

Development of an Efficient Hybrid Method for Motif Discovery in DNA Sequences

This work presents a hybrid method for motif discovery in DNA sequences. The proposed method called SPSO-Lk, borrows the concept of Chebyshev polynomials and uses the stochastic local search to improve the performance of the basic PSO algorithm as a motif finder. The Chebyshev polynomial concept encourages us to use a linear combination of previously discovered velocities beyond that proposed b...

Journal:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 7, Issue 1-2

Pages: -

Publication date: 2000